Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning

Authors

Abstract

We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite-horizon MDP, optimizing the variance of the per-step reward random variable. MVPI enjoys great flexibility in that any policy evaluation method and risk-neutral control method can be dropped in for risk-averse control off the shelf, in both on- and off-policy settings. This flexibility reduces the gap between risk-neutral and risk-averse control and is achieved by working on a novel augmented MDP directly. We propose risk-averse TD3 as an example instantiating MVPI, which outperforms vanilla TD3 and many previous risk-averse methods in challenging Mujoco robot simulation tasks under a risk-aware performance metric. It is the first to introduce deterministic policies and off-policy learning into risk-averse reinforcement learning, both of which are key to the performance boost we show in Mujoco domains.
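To make the idea concrete, below is a minimal, self-contained sketch of the MVPI loop on a toy two-armed bandit. The augmented reward r − λ(r − y)² and the alternating update of the dual variable y (the mean per-step reward of the current policy) are one plausible reading of the abstract, not the authors' exact formulation; the function names (`pull`, `mvpi_bandit`) and the greedy improvement step are hypothetical illustrations.

```python
import numpy as np

# Minimal sketch (not the authors' code): MVPI on a two-armed bandit.
# Arm 0: mean 0.5, std 0.1 (low variance); arm 1: mean 1.0, std 1.0 (high variance).
rng = np.random.default_rng(0)

def pull(arm):
    """Sample one per-step reward from the chosen arm."""
    means, stds = (0.5, 1.0), (0.1, 1.0)
    return rng.normal(means[arm], stds[arm])

def mvpi_bandit(lam=1.0, iterations=30, samples=1000):
    probs = np.array([0.5, 0.5])  # stochastic policy over the two arms
    for _ in range(iterations):
        # Step 1: estimate the dual variable y, the mean per-step reward
        # of the current policy (any policy evaluation method would do).
        arms = rng.choice(2, size=samples, p=probs)
        y = np.mean([pull(a) for a in arms])

        # Step 2: risk-neutral improvement on the augmented reward
        # r_hat = r - lam * (r - y)^2, which penalizes reward variability
        # (assumed form, constant-shifted from Fenchel-duality variants).
        aug_values = []
        for a in (0, 1):
            r = np.array([pull(a) for _ in range(200)])
            aug_values.append(np.mean(r - lam * (r - y) ** 2))

        # Greedy step toward the arm with the best augmented value,
        # smoothed so the policy stays stochastic during evaluation.
        probs = 0.9 * np.eye(2)[int(np.argmax(aug_values))] + 0.1 * probs
    return probs

print(mvpi_bandit(lam=1.0))  # concentrates on the low-variance arm 0
print(mvpi_bandit(lam=0.0))  # risk-neutral: prefers the high-mean arm 1
```

With lam = 1.0 the policy concentrates on the low-variance arm, while lam = 0.0 recovers risk-neutral behavior; swapping the greedy bandit step for TD3 on the augmented reward is the analogue of the risk-averse TD3 instantiation mentioned above.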

Similar Resources

Reinforcement Learning Leads to Risk Averse Behavior

Animals and humans often have to choose between options with reward distributions that are initially unknown and can only be learned through experience. Recent experimental and theoretical work has demonstrated that such decision processes can be modeled using computational models of reinforcement learning (Daw et al., 2006; Erev & Barron, 2005; Sutton & Barto, 1998). In these models, agents use...

Preference-Based Policy Iteration: Leveraging Preference Learning for Reinforcement Learning

This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a “preference-based” approach to reinforcement learning is a possible extension of the type of feedback an agent may learn from. In particular, while conventional RL methods are essentially confined to deal with numeri...

Equilibrium in an ambiguity-averse mean-variance investors market

Keywords: Robust optimization; Mean–variance portfolio theory; Ellipsoidal uncertainty; Equilibrium price system. In a financial market composed of n risky assets and a riskless asset, where short sales are allowed and mean–variance investors can be ambiguity averse, i.e., diffident about mean return estimates where confidence is represented using ellipsoidal uncertainty sets, we de...

Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning

Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. Then we propose a design method of good sampling policies for efficient exploration, which is particularl...

Bellman Gradient Iteration for Inverse Reinforcement Learning

This paper develops an inverse reinforcement learning algorithm aimed at recovering a reward function from the observed actions of an agent. We introduce a strategy to flexibly handle different types of actions with two approximations of the Bellman Optimality Equation, and a Bellman Gradient Iteration method to compute the gradient of the Q-value with respect to the reward function. These metho...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i12.17302